Graph Neural Networks (GNNs) had been demonstrated to be inherently susceptible to the problems of over-smoothing and over-squashing. These issues prohibit the ability of GNNs to model complex graph interactions by limiting their effectiveness at taking into account distant information. Our study reveals the key connection between the local graph geometry and the occurrence of both of these issues, thereby providing a unified framework for studying them at a local scale using the Ollivier's Ricci curvature. Based on our theory, a number of principled methods are proposed to alleviate the over-smoothing and over-squashing issues.
translated by 谷歌翻译
Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on the novel insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
translated by 谷歌翻译
Inverse medium scattering solvers generally reconstruct a single solution without an associated measure of uncertainty. This is true both for the classical iterative solvers and for the emerging deep learning methods. But ill-posedness and noise can make this single estimate inaccurate or misleading. While deep networks such as conditional normalizing flows can be used to sample posteriors in inverse problems, they often yield low-quality samples and uncertainty estimates. In this paper, we propose U-Flow, a Bayesian U-Net based on conditional normalizing flows, which generates high-quality posterior samples and estimates physically-meaningful uncertainty. We show that the proposed model significantly outperforms the recent normalizing flows in terms of posterior sample quality while having comparable performance with the U-Net in point estimation.
translated by 谷歌翻译
Recognizing handwriting images is challenging due to the vast variation in writing style across many people and distinct linguistic aspects of writing languages. In Vietnamese, besides the modern Latin characters, there are accent and letter marks together with characters that draw confusion to state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, there are not many datasets for researching handwriting recognition in Vietnamese, which makes handwriting recognition in this language have a barrier for researchers to approach. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset constructed by connecting pen stroke coordinates without further processing. This approach obviously can not measure the ability of recognition methods effectively, as it is trivial and may be lack of features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that associates crucial natural attributes required for offline handwriting images. Using our method, we provide a first high-quality synthetic dataset which is complex and natural for efficiently evaluating handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to figure out the challenge to reach the solution for handwriting recognition in Vietnamese.
translated by 谷歌翻译
Image captioning is currently a challenging task that requires the ability to both understand visual information and use human language to describe this visual information in the image. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based method by extending Object Relation Transformer architecture with Attention on Attention mechanism. Experiments on the VieCap4H dataset show that our proposed method significantly outperforms its original structure on both the public test and private test of the Image Captioning shared task held by VLSP.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
We introduce efficient deep learning-based methods for legal document processing including Legal Document Retrieval and Legal Question Answering tasks in the Automated Legal Question Answering Competition (ALQAC 2022). In this competition, we achieve 1\textsuperscript{st} place in the first task and 3\textsuperscript{rd} place in the second task. Our method is based on the XLM-RoBERTa model that is pre-trained from a large amount of unlabeled corpus before fine-tuning to the specific tasks. The experimental results showed that our method works well in legal retrieval information tasks with limited labeled data. Besides, this method can be applied to other information retrieval tasks in low-resource languages.
translated by 谷歌翻译
成功的人工智能系统通常需要大量标记的数据来从文档图像中提取信息。在本文中,我们研究了改善人工智能系统在理解文档图像中的性能的问题,尤其是在培训数据受到限制的情况下。我们通过使用加强学习提出一种新颖的填充方法来解决问题。我们的方法将信息提取模型视为策略网络,并使用策略梯度培训来更新模型,以最大程度地提高补充传统跨凝结损失的综合奖励功能。我们使用标签和专家反馈在四个数据集上进行的实验表明,我们的填充机制始终提高最先进的信息提取器的性能,尤其是在小型培训数据制度中。
translated by 谷歌翻译
我们提出了一个数据收集和注释管道,该数据从越南放射学报告中提取信息,以提供胸部X射线(CXR)图像的准确标签。这可以通过注释与其特有诊断类别的数据相匹配,这些数据可能因国家而异。为了评估所提出的标签技术的功效,我们构建了一个包含9,752项研究的CXR数据集,并使用该数据集的子集评估了我们的管道。以F1得分为至少0.9923,评估表明,我们的标签工具在所有类别中都精确而始终如一。构建数据集后,我们训练深度学习模型,以利用从大型公共CXR数据集传输的知识。我们采用各种损失功能来克服不平衡的多标签数据集的诅咒,并使用各种模型体系结构进行实验,以选择提供最佳性能的诅咒。我们的最佳模型(CHEXPERT-FRECTER EDIDENENET-B2)的F1得分为0.6989(95%CI 0.6740,0.7240),AUC为0.7912,敏感性为0.7064,特异性为0.8760,普遍诊断为0.8760。最后,我们证明了我们的粗分类(基于五个特定的异常位置)在基准CHEXPERT数据集上获得了可比的结果(十二个病理),以进行一般异常检测,同时在所有类别的平均表现方面提供更好的性能。
translated by 谷歌翻译
药物误解是可能导致对患者造成不可预测后果的风险之一。为了减轻这种风险,我们开发了一个自动系统,该系统可以正确识别移动图像中的药丸的处方。具体来说,我们定义了所谓的药丸匹配任务,该任务试图匹配处方药中药丸所拍摄的药丸的图像。然后,我们提出了PIMA,这是一种使用图神经网络(GNN)和对比度学习来解决目标问题的新方法。特别是,GNN用于学习处方中文本框之间的空间相关性,从而突出显示带有药丸名称的文本框。此外,采用对比度学习来促进药丸名称的文本表示与药丸图像的视觉表示之间的跨模式相似性的建模。我们进行了广泛的实验,并证明PIMA在我们构建的药丸和处方图像的现实数据集上优于基线模型。具体而言,与其他基线相比,PIMA的准确性从19.09%提高到46.95%。我们认为,我们的工作可以为建立新的临床应用并改善药物安全和患者护理提供新的机会。
translated by 谷歌翻译